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15 Abstract 

16 Heritable variation in gene expression is a source of evolutionary change and our understanding of the 

17 genetic basis of expression variation remains incomplete. Here, we dissected the genetic basis of 

18 transcriptional variation in a wild, outbreeding gymnosperm (Picea glauca) according to linked and 

19 unlinked genetic variants, their allele-specific (cis) and allele non-specific (trans) effects, and their 

20 phenotypic additivity. We used a novel plant system that is based on the analysis of segregating alleles of 

21 a single self-fertilized plant in haploid and diploid seed tissues. We measured transcript abundance and 

22 identified transcribed SNPs in 66 seeds with RNA-seq. Linked and unlinked genetic effects that 

23 influenced expression levels were abundant in the haploid megagametophyte tissue, influencing 48% and 

24 38% of analyzed genes, respectively. Analysis of these effects in diploid embryos revealed that while 

25 distant effects were acting in trans consistent with their hypothesized diffusible nature, local effects were 

26 associated with a complex mix of cis, trans and compensatory effects. Most cis effects were additive 

27 irrespective of their effect sizes, consistent with a hypothesis that they represent rate -limiting factors in 

28 transcript accumulation. We show that trans effects fulfilled a key prediction of Wright's physiological 

29 theory, in which variants with small effects tend to be additive and those with large effects tend to be 

30 dominant/recessive. Our haploid/diploid approach allows a comprehensive genetic dissection of 

3 1 expression variation and can be applied to a large number of wild plant species. 

32 Introduction 

33 Evolutionary changes are fueled by genetic variation that translates to molecular, cellular and 

34 physiological levels, upon which natural selection acts. Gene transcription plays a pivotal role in the 

35 transfer of genetic variation to phenotypes as it is the first functional step from information encoded in the 

36 genome (Fay and Wittkopp, 2008, Emerson and Li, 2010, Romero et al, 2012, Wittkopp and Kalay, 

37 2012). Variation in transcript abundance correlates with variation in protein translation (Muzzey et al., 

38 2014) and abundance (Albert et al, 2014), and translates into cellular phenotypes such as human cancer 

39 (Gregory and Cheung, 2014), variation in physiological traits such as chemical composition of the cell 
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40 wall in trees (Mac Kay et al., 1997) and in morphological phenotypes such as defensive pelvic apparatus in 

41 fish (Chan et al, 2010), for example. Given the critical role that gene transcription plays in determining 

42 phenotypes, investigating the genetic bases of expression variation is central to our understanding of 

43 evolution. 

44 Dissection of the genetic architecture of expression variation will help us to uncover the driving forces 

45 behind its emergence, persistence and divergence. As for any quantitative trait, determining the genetic 

46 architecture of expression variation includes identifying the number of loci involved and their genomic 

47 positions, the molecular mechanisms by which such loci influence expression and their homozygous and 

48 heterozygous effects on the expression phenotype (Mackay, 2001). The first of these goals can be 

49 achieved by mapping the positions of genetic effects influencing transcription (expression). For any given 

50 gene (the focal gene), genetic effects typically map to multiple linked and unlinked loci (Brem et al., 

51 2002), and are here referred to as local and distant effects, respectively. The molecular mechanisms by 

52 which local and distant effects influence expression can be inferred from the allele-specificity of these 

53 effects. Genetic effects representing diffusible molecules are expected to influence the expression of both 

54 alleles of a diploid cell in a similar fashion, while non-diffusible effects are likely to be allele-specific 

55 (Wittkopp et al., 2004). Allele-specific and non-specific effects are here referred to as cis and trans, 

56 respectively (Landry et al. 2005, Rockman and Kruglyak, 2006, Ronald et al., 2005, Skelly et al, 2009, 

57 Emerson and Li, 2010). In general, unlinked effects are assumed to influence focal gene expression levels 

58 in trans through diffusible molecular signals, such as protein or RNA (Wittkopp et al, 2004, Tirosh et al., 

59 2009, McManus et al, 2010, Emerson et al., 2010, Emerson and Li, 2010, Gruber et al., 2012, Goncalves 

60 et al, 2012, Meiklejohn et al., 2013, Coolon et al, 2014). Genetic effects that are linked with the focal 

61 gene are often interpreted as local sequence variants that alter the affinity of transcriptional regulators, 

62 mRNA stability or chromatin states in cis (Goncalves et al., 2012, Wittkopp and Kalay, 2012, Chang et 

63 al., 2013, Kilpinen et al., 2013, McVicker et al., 2013). Considerable deviation is however expected from 

64 these predictions, as local variants can act in trans through diffusible molecules (e.g. Jacob and Monod, 
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65 1961, Ronald et al., 2005) and distant effects in cis through direct interactions between chromosomes (e.g. 

66 Spilianakis et al, 2005, Kilpinen et al., 2013, Jin et ai, 2013, Battle et al, 2014). 

67 Persistence of expression variation within and expression divergence among populations depends on the 

68 efficiency of natural selection to purge detrimental and to select for beneficial variation. Genetic variation 

69 with beneficial additive or dominant phenotypes are thought to be a likely source of evolutionary change 

70 because they are exposed to natural selection as soon as they emerge in a population and can thus quickly 

71 reach fixation (Hartl et al, 1997). Time for new recessive beneficial mutations to reach fixation is 

72 significantly longer, and such alleles are expected to contribute relatively little to adaptive differences (i.e. 

73 Haldane's sieve, Haldane, 1927). In addition, even a low level of recessivity allows harmful variation to 

74 persist in populations at low frequencies under mutation selection balance (Hartl et al, 1997). 

75 Distinguishing between additive and dominant/recessive modes of inheritance therefore becomes a key in 

76 understanding the evolutionary dynamics of expression variation in cases such as outbreeding species 

77 where heterozygosity is common (McManus et al., 2010, Lemos et al., 2008). 

78 The relationships between the location, allele specificity and phenotypic additivity of genetic effects on 

79 expression are a key in determining the genetic bases of expression variation (Rockman and Kruglyak, 

80 2006). Haploid/diploid systems are a powerful tool for this task (Ronald et al, 2005). The haploid phase 

81 of these systems facilitate discovery of genetic effects because dominance is absent and effects should 

82 segregate 1:1. The allele-specificity and dominance relationships of the effects can be thereafter tested in 

83 diploids. Studies leveraging on a haploid/diploid approach have been relatively rare, and studies of the 

84 genetic architecture of expression variation have mostly investigated interspecific crosses (e.g. McManus 

85 et al., 2010), or predominantly inbreeding species (e.g. Zhang et al., 2011) where heterozygosity is 

86 limited. Other studies have considered new mutations instead of standing variation (Gruber et al., 2012), 

87 or excluded the possibility of self-regulating (local trans) effects (Meiklejohn et al., 2013). Overall, very 

88 few documented systems exist that permit the comparison of effect linkage, allele-specificity and 
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89 inheritance, and even fewer are the systems where this can be applied directly to individuals from 

90 outbreeding natural populations. 

91 We recently explored an outbreeding system in a multicellular eukaryote where genetic effects on gene 

92 expression can be identified in a haploid state (Verta et ah, 2013). This system is based on measuring gene 

93 expression in the megagametophyte tissue of the seed of gymnosperm plants. Megagametophytes are 

94 haploid products of meiosis and their genomes are maternally inherited (Fig. 1A, Williams, 2009). Here, 

95 we expand this biological system to include a diploid segregating tissue by self-fertilizing a white spruce 

96 (Picea glauca [Moench. Voss]) individual. In a self-cross, alleles of loci that are heterozygous in the 

97 mother tree segregate 1:1 in the megagametophytes and in 1:2:1 genotype combinations in the embryos 

98 (Fig. IB). This enables measurement of the impacts of the same genetic variants in haploid and diploid 

99 tissues. Another advantage of the system is that both the megagametophyte and embryo inherit identical 

100 maternal genomes (Fig. IB), facilitating genetic tracking of the progeny. We use this haploid/diploid 

101 system in conjunction with RNA-seq (Fig. 1C) to evaluate the intersect between the genomic location and 

102 allele specificity of genetic effects on gene expression as well as their additive versus dominant/recessive 

103 mode of inheritance (Fig. ID). 

104 Results 

105 Haploid comparison 

106 We performed transcriptome sequencing (RNA-seq) of 66 megagametophytes originating from a single 

107 wild mother tree. For simplicity, we will refer to megagametophytes as haploids in this report. Reads were 

108 aligned to white spruce transcribed gene models (representing unique genes, Rigault et al, 2011, Table 

109 SI). The average portion of uniquely aligned reads and the overall alignment rate were respectively ~50% 

110 and ~70%, on average. Comparison of read counts in samples that were replicated within and between 

111 sequencing lanes indicated that reproducibility was extremely high (Pearson's correlation 0.999 and 0.998, 

112 respectively), and that variation in read counts due to biological variation among samples was much larger 
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113 than variation assigned to technical sources (Fig. SI). We detected expression (defined as at least 90% of 

114 the samples having at least one read for a given gene) of 15,736 of the 27,720 genes represented in the 

115 white spruce gene catalogue (Rigault et al., 201 1). The number of expressed genes was in line with earlier 

116 studies based on microarrays that identified the megagametophyte as one of the spruce tissues with the 

117 largest number of expressed genes (Raherison et al, 2012). 

118 We first tested whether genetic sources of expression variation were in close genetic linkage with the 

119 expressed genes. We examined this by testing for expression differences between allele classes for each 

120 gene, identified based on SNPs in the RNA-seq reads. Sequence variation was observed in 6257 genes, 

121 each with one (28%) or multiple (72%) SNPs that segregated following Mendelian 1:1 expectations in the 

122 haploids (Fig. S2). Base calls on SNP positions were 96% identical across all samples. We assigned each 

123 haploid sample to one of the two alleles of each gene and compared the numbers of RNA-seq reads 

124 assigned to allele classes with a likelihood ratio test. Haploid samples were considered as biological 

125 replicates for each allelic class in this approach. Abundant local variation in gene expression was 

126 observed, influencing roughly half (48%) of the genes with SNPs (g<0.01, Table S2). These genes were 

127 assigned as being under local effects. This abundant local genetic variation indicates that genetic variants 

128 in the genomic proximity of focal genes have a large-scale influence on expression levels. 

129 Next, we examined cases where the genetic bases of expression variation were unlinked to the focal gene. 

130 We tested for association between the expression levels of all genes and genotypes of unlinked loci 

131 inferred from RNA-seq data (94,245,480 tests). Genes were grouped based on linkage between genotypes 

132 into 12 linkage blocks of nearly equal size, consistent with the haploid chromosome number of spruce, 

133 and a few smaller blocks (Fig. S3). Genes were left unordered within these blocks. Focal genes not 

134 containing SNPs could not be mapped, and any association with markers in other loci could not be clearly 

135 determined as local or distant. We therefore proceeded with independent analyses of the distant 

136 associations for these genes. 
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137 We observed that 38% of tested focal genes were influenced by distant effects (g<0.01, Table S2). Most of 

138 the distant effects were observed in a number of loci assigned to the same linkage block, which were 

139 counted as a single effect (seen as horizontal lines within linkage blocks in Fig. 2). Overall, distant effects 

140 were distributed randomly over linkage blocks (Supplemental text). Their relatively high abundance 

141 compared to local effects, which are more likely to be discovered due to a higher statistical power, points 

142 towards a complex and highly connected control of gene expression levels across linkage blocks. 

143 We compared the local and distant genetic effects to the cases where our previous investigation of the 

144 same individual tree indicated that expression variation in the focal gene was associated with a single 

145 major effects locus (Verta et ah, 2013). Here we analyzed 397 of the genes under major genetic effects, of 

146 which one hundred exhibited local effects and 262 exhibited distant effects. Taking both local and distant 

147 effects into account, 91% of the major genetic effects observed in our previous study could be identified 

148 here, i.e. over sixty times more than expected by chance alone. The majority of local and distant effects 

149 identified in this report explained expression variation in the progeny only partially and likely represented 

150 parts of a polygenic basis of expression variation (Supplemental text and Fig. S4). 

151 Haploid/diploid comparison 

152 We next examined whether the genetic effects identified in the haploids translated into genetic effects in 

153 the diploids (embryos). RNA-seq for 66 self-fertilized embryos were obtained from the same seeds used 

154 for haploid analysis (Fig. 1) and aligned to white spruce gene models (Rigault et al., 2011, Table SI), 

155 producing similarly high alignment rates. The expression of 15,859 genes was detected in the diploids, 

156 indicating that the embryos were at least as transcriptionally active as the megagametophytes. 

157 Genetic effects on gene expression can be specific to diploid tissues (e.g. Drost et al., 2010). We therefore 

158 separated genes into preferentially expressed in the haploids, diploids, or non -preferentially expressed. 

159 Most expression differences between tissues were of small magnitude; average expression levels in 

160 haploids were highly correlated with expression in diploids (Figs. 3A and S1L). A total of 2853 genes that 
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161 varied by log 2 fold change of two or higher were deemed as tissue preferential. The majority of genes 

162 were assigned as non -preferentially expressed, indicating that expression divergence was relatively minor 

163 and allow genetic effects to be compared across haploids and diploids (Figs. 3A and S1L). 

164 As a first step to compare genetic effects in haploids and diploids, we investigated whether local and 

165 distant effects could be observed in diploid homozygotes. We tested for differential expression between 

166 diploid genotype classes that were homozygous for the local and distant effects. Proportions of effects that 

167 were recapitulated between homozygous diploids ranged from 17% to 45% (g<0.01, Fig. 3B and Tables 

168 S3 and S4). Effects that explained a larger part of expression variation were more likely to be 

169 recapitulated in homozygotes (t-test P<2.2e-16). The identity of high and low expression alleles remained 

170 largely unchanged (82% for local effects and 86% for distant effects, Fig. 3B). In a small proportion of the 

171 cases (-15 %) the relative expression levels of the alleles were reversed in the diploids, a phenomenon 

172 that has been observed in haploid/diploid yeast (Ronald et al., 2005, Gruber et al., 2012). Inclusion of 

173 these effects in downstream analyses did not change our conclusions, but we nonetheless excluded these 

174 cases of expression reversals because they may result from complex genetic -ploidy interactions. The set of 

175 effects that were observed in homozygotes forms an important reference point in our further analyses 

176 where effects are compared across diploid genotypes. 

177 Allele specific expression 

178 Having established a set of genetic effects observed across ploidies, we focused on heterozygous diploids 

179 to examine the allele-specificity (cis versus trans) of local and distant effects. The distinction between cis 

180 and trans can be established by testing whether a genetic variant influences the expression of one, or two 

181 alleles of the focal gene in heterozygous individuals (Wittkopp et al., 2004). We first tested for cis effects 

182 between the two alleles in heterozygous diploids in order to establish a set of genes under cis effects to 

183 which haploid expression would be compared. A third (31%) of tested genes showed expression 

184 differences between the alleles (g<0.01, Table S3), indicating that cis effects were common in diploids. 
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185 Next, we investigated the relationship between locally linked variants and allele-specific (cis) effects. 

186 Technical differences in determining effects in haploids and diploids were taken into account by defining 

187 a set of local effects that were observable based on SNP counts (Methods and Supplemental text). We 

188 observed that 55% of the genes under local effects were also associated with cis effects (over four and half 

189 times the frequency expected by chance), while 75% of the genes that showed cis effects were assigned as 

190 under local effects (over six times the frequency expected by chance, Fig. 4A, 4B and 4C, Table S3). 

191 Relative allele expression levels remained unchanged in the large majority of the cases (96%, Fig. S6). 

192 These results suggested that allele-specific effects were more often due to local genetic variants than the 

193 contrary, where local genetic effects were acting in an allele-specific manner. 

194 Comparisons across diploid genotypes were used to identify local genetic variation that acted in trans. The 

195 expectation for trans effects was that homozygous diploids would have different expression levels, but no 

196 differential expression between alleles would be detected in heterozygotes (Wittkopp et ai, 2004, Fig. 

197 4E). Following this reasoning, any homozygote difference in genes under local effects that were not 

198 associated with cis effects were designated as local effects acting in trans. These were observed in 12% of 

199 the tested genes (Table S3). Having equal effects on both of the alleles in heterozygotes, these effects were 

200 candidate diffusible signals that were due to self-regulation, or independent genes situated in close linkage 

20 1 to the expressed gene. 

202 Comparison across homo- and heterozygote diploids facilitated further dissection of local effects. We first 

203 investigated whether local cis effects could be observed in homozygous diploids in addition to the 

204 heterozygotes. The majority (438 focal genes, 61% of tested) of local cis effects were also detected in 

205 homozygotes (Fig. 4D and 4E), while in 284 focal genes (39% of tested) cis differences were observed in 

206 heterozygotes only (Fig. 4E, 4F and S7). We verified that allelic expression differences were consistent 

207 between individual heterozygous diploids, i.e. the same allele was more strongly expressed in all 

208 heterozygotes (Fig. 4D). Previous studies in interspecific hybrids have reported cis effects specific to 

209 heterozygous genotypes, and interpreted such patterns as compensation of the cis effect by different trans 
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210 backgrounds in the homozygotes (e.g. Wittkopp et al., 2004, Landry et al, 2005, McManus et al., 2010). 

211 The observed patterns of heterozygote-specific cis effects were consistent across the progeny, i.e. largely 

212 independent of the genetic background that segregates separately from the focal gene. This suggests that 

213 either the compensatory factors were in very close linkage to the focal gene, or that these effects were 

214 caused by different mechanisms, such as interaction between the cis effect alleles, which we find unlikely. 

215 Finally, 165 focal genes under either of local cis or local trans effects changed signs between 

216 homozygotes and heterozygotes (18% of tested, Fig. 4E). This pattern is also compatible with either 

217 compensation by a closely linked factor the effect of which is larger than that of the focal effects (termed 

218 here as "overcompensation", Fig. 4F, Goncalves et al., 2012), or by an interaction between the cis effect 

219 alleles. Including these effects in downstream analyses did not change the conclusions. We however 

220 excluded the cases where local effects changed signs in order for our analysis to be linear across 

221 genotypes. 

222 We determined whether distant effects identified in the haploids were associated with cis or trans effects 

223 in the diploids. Distant effects on homozygous genes were assumed to act in trans because genes under cis 

224 effects are expected to be polymorphic (Zhang and Borevitz, 2009, McManus et al, 2010, Zhang et al., 

225 2011, Goncalves et al., 2012), yet no c/s-divergence in the focal gene RNA-seq reads could be observed. 

226 The hypothesis that distant variants act in cis was testable in embryos that were heterozygous for both the 

227 distant and the focal genes. Patterns consistent with distant cis effects were discovered in a frequency 

228 corresponding to our false discovery rate (45 of 3665 tested associations) and the segregating frequencies 

229 of these indicated that the association pairs were in fact linked in most cases. The overwhelming majority 

230 of distant effects therefore acted in trans, hence likely representing diffusible signals. 

231 Additive versus dominant/recessive mode of inheritance of expression variation 

232 Our third objective was to investigate the additive versus dominant/recessive mode of inheritance of 

233 genetic effects. We compared focal gene expression levels between diploids that were homo- and 

234 heterozygous for the genotypes associated with the genetic effects. For additive effects, statistically 
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235 significant expression differences were expected between heterozygotes and both of the homozygotes 

236 (Fig. 5). In contrast, dominance was expected to produce a statistical difference between the heterozygous 

237 genotype and only one of the homozygotes (the recessive genotype). 

238 Different patterns of inheritance were observed for effects acting in cis and trans. Expression phenotypes 

239 were mostly additive in genes under local cis effects (83% additive vs. 17% dominant/recessive, g<0.01), 

240 whereas those under local trans effects were more dominant/recessive (59% dominant/recessive vs. 10% 

241 additive, 31% unknown, g<0.01, Table S3 and Fig. 5). Expression variation followed a 

242 dominant/recessive inheritance in the large majority of distant effects as well, all of which acted in trans 

243 (g<0.01, Table S4 and Fig. 5). The observed clear distinction in the inheritance of local cis and trans 

244 effects suggested that the allele-specificity of the effects was important in determining their level of 

245 additivity. 

246 We next investigated the level of phenotypic masking by dominance relationships by calculating a ratio 

247 between the midpoint expression level separating homozygotes and the difference of the heterozygote 

248 expression level from this midpoint (D/A ratio, Gibson et ah, 2004, Fig. 6A). As expected, D/A ratios 

249 largely recapitulated the identity of additive and dominant/recessive effects in each category (Fig. 6B). In 

250 dominant/recessive effects, a relatively low difference between heterozygotes and the homozygote 

251 midpoint (absolute D/A values in majority between zero and one, Fig. 6B) suggested that phenotypic 

252 masking by dominance relationships was not complete. Clearest distinction between dominance and 

253 additivity in expression was observed between distant trans effects influencing non-mapped focal genes 

254 and local cis effects, likely because these represented the classes with most observations (Fig. 6B). 

255 Next, we examined the relationship between homozygous genetic effect size (see Methods and 

256 Supplementary text) and the level of phenotypic masking by dominance. Wright's physiological theory of 

257 dominance predicts that if the phenotype is the result of many interdependent steps, phenotypic masking 

258 by dominance should become stronger in variants associated with larger effects (Wright, 1934, Kacser and 

259 Burns, 1981, Phadnis and Fry, 2005). Overall, expression variation seems to follow predictions of the 

11 
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260 physiological theory of dominance: Distant trans effects that explained a larger proportion of expression 

261 variation were associated with D/A ratios closer to 1 or -1 (P<2e-4, Fig. 6C), indicating that phenotypic 

262 masking by dominance grew stronger with larger effects. In addition, distant trans effects classified as 

263 additive typically explained a small portion of expression variation, while effects classified as 

264 dominant/recessive typically had larger effects (seen as a gradient of red and black data points in Fig. 6C). 

265 Some effects of small size were classified as dominant/recessive, suggesting that a minority of effects may 

266 diverge from the predictions of Wright's physiological theory (see: Kondrashov and Koonin, 2004, 

267 Agrawal and Whitlock, 2011). Interestingly, correlation between D/A ratio and effect size was nearly 

268 symmetric for D/A ratio values below and above zero. This indicates that phenotypic response to genetic 

269 variation is similar irrespective of whether heterozygote expression was closer to high or low expression 

270 homozygote (the phenotype response curve in Fig. 6D would be inverted for negative D/A ratios). 

271 Wright's theory also predicts that if the phenotype depends on a single rate -limiting factor, the level of 

272 dominance should be independent on the effect size or negatively correlated due to increased statistical 

273 power in larger effects (Fig. 6C). Such result was observed for cis effects; increasing effect size 

274 corresponded to stronger additivity of phenotypes (P<3e-3, Fig. 6C). Also local trans effects showed a 

275 significant correlation of stronger additivity with larger effects (,P=3e-3, Fig. S8). These results are most 

276 consistent with a linear response curve between genotype and phenotype with one rate-limiting factor 

277 determining the phenotype (Wright, 1934, Kacser and Burns, 1981, Fig. 6C). 

278 Discussion 

279 We dissected the genetic bases of expression variation in segregating haploid and diploid progeny of an 

280 outbreeding, wild gymnosperm plant. We analyzed the intersect between linked/unlinked genetic effects 

281 on gene expression and their allele-specific (cis) versus allele non-specific effects (trans), and additive 

282 versus dominant/recessive mode of inheritance in heterozygotes. We discovered abundant genetic 

283 variation in gene expression associated with local (linked) genetic variants (48% of genes) as well as 

284 distant (unlinked) variants (38%). The sets of genes under local and distant genetic effects overlapped; 
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285 24% of genes under local effects were also under distant effects, highlighting the polygenic nature of 

286 expression variation. We had previously used the haploid component of the same approach to identify 

287 segregating expression variation that was consistent with underlying single major effect loci in the same 

288 individual we studied here (Verta et al., 2013). Most of the major genetic effects on expression variation 

289 was re -discovered in this study, along with many more genetic effects with typically smaller effects on 

290 expression variation (Supplementary text), demonstrating the high reproducibility of the approach for 

29 1 studying the genetic bases of expression variation. 

292 Effect of linkage versus allele specificity 

293 We observed that local expression variation was associated with a complex mix of cis, trans and 

294 compensatory cis-trans effects. Most local effects acted in cis (53%), consistent with earlier studies 

295 (Ronald et al, 2005, Pickrell et al, 2010, Cheung et al, 2010, Zhang et al, 2011, Gruber et al, 2012, 

296 Massouras et al, 2012, Meiklejohn et al, 2013, Lagarrigue et al, 2013, Battle et al, 2014). These 

297 observations are in line with the view that a significant part of local effects represent allelic sequence or 

298 chromatin variation that alter the rate of transcription or mRNA accumulation in general (Wittkopp and 

299 Kalay, 2012, Shlyueva et al, 2014), but see (Gerstein et al, 2012). However, our results indicate that a 

300 non-negligible portion (up to 12% of tested) of local variation acted in trans. We observed that the 

30 1 proportion of local effects acting in trans increased to a third when we investigated local effects that had 

302 effects across all diploid genotype combinations (because some local cis effects were not observed in 

303 homozygotes due to compensatory trans effects, Fig. 4E). Although rarely dissected in a systematic 

304 fashion, indications that trans acting effects might represent a significant portion of local variation have 

305 been reported (Ronald et al, 2005, Cheung et al, 2010, Zhang et al, 2011, Lagarrigue et al, 2013). 

306 Taking into account the resolution of our study (66 segregant progeny), the factors associated with local 

307 trans regulation could be independent of the focal gene but situated on the same chromosome. It is also 

308 possible that some local trans effects influence self-regulation of transcription, such as in the well- 

309 documented case of the yeast gene AMN1, where the gene product regulates the transcription of its locus 
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310 (Ronald et al., 2005). Further fine-mapping would help to determine whether the local trans effects 

311 identified here were caused by variation in the gene itself, or independent genes in linkage. 

312 A complex architecture likely underlies most local effects. In a considerable portion (39% of tested) of the 

313 local cis effects, expression was allele-specific in heterozygotes, but no difference between homozygous 

314 genotypes was observed (Fig. 4E). This pattern is most consistent with compensation between cis and 

315 trans effects; focal effects are cancelled in homozygotes but not in heterozygotes where the compensatory 

316 factors cancel each other (Fig. 4F, Landry et al., 2005). In addition, 165 (18% of tested) focal genes 

317 showed patterns consistent with the compensatory effect being stronger than the focal effect, leading to a 

318 change in signs between homo- and heterozygotes (Fig. 4F, Goncalves et al, 2012). Our results show that 

319 the putative compensatory factors must be in very close linkage to the focal gene because their effects are 

320 observed in a segregating progeny. 

321 Overall, compensation seems to be a common feature of transcriptional networks and suggests that 

322 expression variation is (or has been) under stabilizing selection (Wittkopp et al., 2004, Landry et al., 2005, 

323 Tirosh et al., 2009, McManus et al., 2010, Goncalves et al, 2012). Stabilizing selection allows mutation 

324 of small effects that increase and decrease expression levels in opposing directions to accumulate in 

325 populations. Recombination between compensatory factors may give rise to combination of genotypes that 

326 show extreme phenotypes, which represent a form of genetic load (see: Charlesworth 2013). Although 

327 compensation seems to be widespread across organisms, it has remained unknown if the compensatory 

328 factors were dispersed over chromosomes or not because earlier studies have not measured linkage. Close 

329 linkage between the compensatory and focal effects such as seen here could be favored by natural 

330 selection as recombination would tend to break compensatory associations, thus unleashing negative 

33 1 expression variation or leading to other maladaptive expression traits. The means by which closely linked 

332 compensatory effects evolve remains to be determined, but could involve either preferential establishment 

333 of compensatory effects near the focal effect locus, or "migration" of effects to close linkage by means of 

334 genomic rearrangements (see also: Yeaman, 2013). 
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335 Evolution of expression variation 

336 Previous studies have established that expression divergence due to cis variation is more common between 

337 species, compared to the more widespread trans variation observed within species (Wittkopp et al., 2008, 

338 Emerson et al., 2010, Schaefke et ah, 2013). These patterns are likely influenced by dominance 

339 relationships between the underlying allelic variants; populations may preferentially accumulate recessive 

340 effects because they are less likely to reach fixation (Lemos et al., 2008). Consistently, trans variation 

341 seems to be much more recessive than cis variation (this study, Lemos et al., 2008, McManus et al, 2010, 

342 Emerson et al., 2010, Gruber et al., 2012, Schaefke et al, 2013). However, because previous studies have 

343 not distinguished between local/distant and cisltrans effects (but see Gruber et al., 2012), it has remained 

344 unclear whether effect linkage or allele-specificity underlie these differences in inheritance. 

345 Our results indicate that the level of dominance in expression phenotypes seem to be dependent on the 

346 allele specificity of genetic effects, more or less independently of their linkage to the focal gene. First, the 

347 majority of distant effects, all acting in trans, were dominant/recessive (Fig. 6B). Second, expression 

348 phenotypes under local cis effects were mostly additive (Fig. 6B). Third, a marked difference was 

349 observed in the inheritance of expression phenotypes under local trans effects versus those under cis 

350 effects. In a majority of local trans effects, heterozygote expression was not statistically different from one 

35 1 of the homozygotes, consistent with dominance by one of the alleles. The level of phenotypic masking, as 

352 measured by the D/A ratio, indicated that dominance was not complete (Fig. S8A). Previous studies 

353 indicate that such partial dominance is a common feature of expression variation (Gibson et al., 2004, 

354 Lemos et al., 2008). Some uncertainty remain in these results, as some statistically dominant/recessive 

355 effects had D/A ratios close to zero, indicating a conflict between the two measures for dominance (e.g. 

356 Fig. S8A). We favor interpreting the results based on statistical expression difference rather that the D/A 

357 ratio because the D/A ratio does not take into account variance within genotype classes and is thus more 

358 limited in terms of interpretation (Gibson et al., 2004). Here also, further dissection of local expression 

359 variation with higher resolution would be informative. 
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360 Our results suggest that further refinement in the assumed evolutionary implications of local variation can 

361 be gained. Gruber et al. (Gruber et al., 2012) proposed that due to their larger phenotypic effects and more 

362 additive nature in diploids, local (in majority cis) genetic variation are under more efficient natural 

363 selection and thus either reach fixation, or are eliminated quicker than distant (in majority trans) variation, 

364 leading to local effects being preferentially observed between species relative to within species (see also: 

365 Lemos et al., 2008). Our results suggest that preferential fixation of effects, if dependent on the mode of 

366 inheritance, should not apply to all local variation but depend on their allele specificity. 

367 Nature of cis and trans effects 

368 The patterns of inheritance reported in the present study are consistent with trans-acting variants 

369 influencing the focal gene via a diffusible molecule. In contrast to cis effects where each allele is 

370 presumably transcribed in an independent quantity, diploid focal genes under trans effects are exposed to 

371 both effect alleles as well as other factors influencing their transcription (Wittkopp et al., 2004). In this 

372 context, we propose that the widespread dominance/recessivity of trans acting variation could emerge 

373 from the complex nature of gene transcription in a similar way as initially described by Sewall Wright in 

374 the form of the physiological theory of dominance (Wright, 1934), and later much elaborated by Kacser 

375 and Burns (1981) as the metabolic theory. According to this scenario, diffusible trans effects would 

376 influence expression through their complementary functions in association with the multitude of other 

377 proteins and RNAs involved in transcription. This might take place at chromosome locations containing 

378 regulatory sequences in the form of combinatory binding of proteins (Gerstein et al., 2012), in the form of 

379 interactions within a transcription factor - co-factor complex (Slattery et al., 201 1), or interaction within a 

380 transcription factor protein complex, for example. Because most trans effects would represent non rate- 

381 limiting factors in gene transcription in this scenario, their variation would generally follow a relationship 

382 of diminishing returns where only the complete absence of a factor due to a homozygous genotype would 

383 lead to an observable expression difference. This haplosufficient characteristic was present in the majority 

384 trans effects, whether local or distant relative to the focal gene (Fig. 5). 
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385 Further, we show that distant trans effects fulfill a key prediction of the physiological and metabolic 

386 theories (Wright, 1934, Kacser and Burns, 1981, Phadnis and Fry, 2005), which is that dominance 

387 becomes stronger with larger effects (Fig. 6C). This characteristic follows from the nature of the reactions 

388 in question (in our case, gene transcription), which contain multiple interdependent steps (like the 

389 transcriptional protein complex and interacting enhancer or repressor proteins, for example). In the theory 

390 of Kacser and Burns (1981), these steps depend on one another because they share the same pool of 

391 substrates and products, which tends to buffer the system to variation in any one component. It is 

392 reasonable to assume that this could also apply to gene transcription not only in the form of substrates, but 

393 also as interaction states between different proteins. When the system contains many steps, the response 

394 between phenotype and genotype takes the form of a hyperbolic curve of diminishing returns (figure 4 in 

395 Kacser and Burns, 1981). The impact of mutation in any one of the constituents on the resulting phenotype 

396 will depend on their importance for the functioning of the whole system. Mutations in factors that have 

397 small effects on the total flux through a reaction when they are homozygous will be additive, while 

398 mutations in components that are essential for flux will be dominant/recessive (Fig. 6C, Kacser and Burns, 

399 1981). 

400 Overall, distant trans effects of small magnitude were more additive than those that explained a large 

401 proportion of expression variation, consistent with the prediction of the metabolic theory. Our results 

402 show that unlike trans effects, larger cis effects were more additive. They therefore seem to represent rate- 

403 limiting factors in gene transcription that could correspond to their structural instead of diffusible nature. 

404 By representing differences in the DNA sequence, cis effects might alter the overall affinity of the whole 

405 panoply of transcriptional regulators to the DNA strand and hence yield a largely additive effect in 

406 heterozygotes irrespective of their effect sizes. 

407 In summary, our systematic dissection of the relationships between linkage, allele-specificity and 

408 dominance in expression variation represents a significant step towards a better understanding of the role 

409 of expression variation in past, present and future evolution. The approach developed in this report can be 
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410 used to investigate expression divergence between and within a wide range of wild plant species. 

411 Haploid/diploid analyses akin to the one reported here are becoming increasingly feasible through the use 

412 of doubled haploid lines of angiosperms for example (Seymour et al., 2012), allowing future investigation 

413 of the evolutionary modes discovered here in other phyla. 

414 Materials and Methods 

415 A white spruce individual (known as 77111) growing at Cap Tourmente, Quebec, Canada (+47° 4' 1.28", 

416 -70° 49' 4.03") was self-fertilized in May 2010 by Jean-Marc Montminy of Canadian Ministry of Natural 

417 Resources. Pollen of 771 1 1 was obtained from Dr. Jean Beaulieu of Canadian Forest Service. Cones were 

418 collected on August 18 th 2010, and stored in paper bags in room temperature and protected from light until 

419 they were used for the study. The seeds were stratified for 4 wk and germination process was initialized as 

420 described in Verta et al. 2013 (empty seeds and seeds where embryos had obvious morphological 

421 anomalies were discarded, see Supplementary text). Seeds were dissected under a microscope. 

422 Megagametophyte and embryo tissues were flash-frozen on liquid nitrogen and stored in -80 degrees C 

423 until they were used for RNA extraction. 

424 RNA extraction and quality control was carried out as described in Verta et al. 2013. Individual RNA-seq 

425 libraries were constructed from each mRNA sample with the TruSeq RNA sample preparation kit 

426 (Invitrogen), with minor modifications to the manufacturer's protocol. The indexed sample libraries were 

427 combined to single lanes in equal amounts in batches of 24 samples for megagametophytes and 23 

428 samples for embryos. A total of seven lanes of 100 bp Hlumina HiSeq-2000 paired-end sequencing of the 

429 pooled libraries was carried out at the Genome Quebec Innovation Centre in Montreal, Canada. One 

430 pooled sample was run per lane. One particular pool was sequenced on two lanes and the reads were 

431 combined. The number of reads per lane is given in table SI. Read quality was verified with the FastQC 

432 software (www.bioinformatics.babraham.ac.uk/projects/fastqc/) and Hlumina sequencing adapters were 

433 trimmed from the reads using TrimGalore (www.bioinformatics.babraham.ac.uk/projects/trim_galore/). 
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434 Alignment of RNA-seq reads 

435 Reads from each individual sample were aligned to white spruce gene models representing single genes 

436 (Rigault et al., 2011) using Bowtie 2 (Langmead and Salzberg 2012) in the local alignment mode with 

437 default parameters. The resulting .sam files were transformed into .bam format, sorted and likely PCR 

438 duplicates were removed with the SAMtools (Li et al., 2009) functions view, sort and dupreni, after which 

439 the files were indexed with the function index. 

440 Expression quantification per transcript 

441 Per transcript read counts were generated using the HTSeq software (Anders et al., 2014) with the 

442 interception-strict read counting mode. Reads having a mapping quality less than 30 were not considered. 

443 Tissue-preferential expression was analyzed in R (version 2.15, R Core Team, 2013) with the package 

444 'edgeR' (Robinson et al., 2010). Samples within in each tissue type were considered as biological 

445 replicates. Normalization factors, and common and tag-wise dispersion estimates were calculated with the 

446 edgeR functions calcN ormFactors(... method— 'TMM'), estimateGLMcommonDispersion and 

447 estimateGLMtagwiseDispersion. Total library size for each sample was used in the calculation of the 

448 offset parameter, as is default in edgeR. Consequently, gene-wise expression differences between the 

449 haploid and diploid samples were summarized with the glmLRT. 

450 SNP discovery in megagametophytes 

451 Transcribed SNPs were identified individually for each megagametophyte sample with the FreeBayes 

452 software (version 0.9.8, Marth, 2012), specifying ploidy as 1 and enabling the standard filters. The 

453 mapping quality, read coverage and segregation of each putative variant was then analyzed with custom 

454 Python (version 2.6) and Biopython (Cock et al., 2009) scripts. A final set of SNPs were selected based on 

455 the following requirements: i) mean mapping score of reads overlapping the SNP position was higher than 

456 30 in at least 90% of the samples where the variant was observed, ii) the nucleotide position of the variant 

457 was covered by at least one read in 90% of all the samples, iii) the frequency of the two alternative 
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458 nucleotides were within the 95% IC of Mendelian segregation, i.e. between 25 and 41 observations when 

459 N=66, iv) at least 90% of the SNPs fulfilling the criteria (i-iii) for a given transcript were linked in 90% of 

460 the samples, i.e. if the transcript had SNPs that were not linked in 90% of the samples (which could be the 

461 case if reads from two similar transcripts map to the same reference sequence) none of the SNPs were 

462 considered for that gene. Reproducibility of base calls in the variant positions were investigated by 

463 comparing all aligned nucleotides overlapping a variant position within each sample. 

464 Embryo genotypes 

465 Variants for embryo samples were called with FreeBayes, specifying ploidy as 2 and enabling the standard 

466 filters. The algorithm was given the genotype (specified as the .vcf file) of the associated 

467 megagametophyte sample as a priori information (option -variant-input). Custom Python scripts were 

468 then used to extract the genotype and the read coverage of each segregating SNP, as identified in the 

469 megagametophytes. Self-fertilization was validated by comparing embryo genotypes over the complete set 

470 of segregating SNPs against megagametophytes from the same seed. 

471 Association testing in megagametophytes 

472 Local effects - Each gene was identified as one of the two maternal genotypes based on transcribed SNPs. 

473 Differential expression was tested considering samples as biological replicates of one of the two alleles 

474 and testing expression difference between the two groups. Two complementary approaches were used. 

475 First, a single SNP was used to assign samples to alleles. Then, gene-wise counts generated with HTSeq 

476 were used in the testing of expression differences between alleles with the R package edgeR as described 

477 above concerning tissue-preferential expression. Normalization factors, dispersions and offsets were 

478 calculated based on all reads in each sample. Q-values were calculated from P-values using the R package 

479 'qvalue' (Storey, 2002). 
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480 In the second approach, gene expression level for each gene/sample was represented as the sum of read 

481 coverages over all heterozygous nucleotide positions (one if the gene had only one segregating SNP, or 

482 many if the gene contained a set of linked heterozygous SNPs). A minimum distance of 100 bp between 

483 SNPs within a given gene was required, corresponding to typical read length. Expression differences 

484 between allele classes were tested as above using edgeR. Offset parameter for the edgeR model was 

485 calculated sample- wise as the logarithm of the sums of alternative and reference allele read counts 

486 observed for each sample. 

487 A bias towards higher expression of the allele represented by the reference has been reported in numerous 

488 studies where reads from a heterozygous individual are mapped to a reference (Stevenson et al, 2013). 

489 Our approach takes this possible source of bias into account in two ways. First, read counts are 

490 summarized over linked SNPs within a gene, and because the two alleles (which are unrelated to the 

491 genotypes used in construction of the white spruce transcriptome reference) are anticipated to contain 

492 alternative and reference nucleotides in an equally frequency, no bias is expected. Second, we took into 

493 account the total number of reads overlapping the alternative and reference nucleotides, and used this as a 

494 baseline (the offset parameter) with which the abundances of the expression level of each allele were 

495 corrected. In the cases where only one allele was different for only a single SNP with respect of the 

496 reference, higher expression level was observed equally in the alleles represented by the alternative and 

497 reference nucleotides (121 and 108 cases, respectively). 

498 Distant effects - Expression level of the focal gene was represented by the sum of reads overlapping the 

499 gene model, as the focal gene genotype was considered to have no impact in the case of distant effects. 

500 First, low-resolution linkage blocks were defined with the R package 'qtT (Broman et al., 2003), using the 

501 function formLinkageGroups with the options max.rf-0.35 and min.lod-6. Genotypes of all other genes 

502 were initially tested against focal gene expression in the cases where both the focal gene, and the variant 

503 against which it was tested were covered by at least one read in more than 90% of the samples. With these 

504 results in hand, we determined the genetic distance over which 95% of the association signals from local 
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505 effects were detectable on other loci (30 cM, <7-threshold>0.01, Fig. S9). Accordingly, the final set of 

506 tested distant loci was filtered to include only loci whose genetic distance to the focal locus was at least 30 

507 cM. 

508 Association testing in embryos 

509 Cis effects - Testing for cis effects was performed for genes in which samples heterozygous for the two 

510 focal alleles were observed in a Mendelian frequency, i.e. between 25 and 41 samples when N=66. 

511 Expression level for each allele was represented as sums of read counts over all heterozygous, linked 

512 variants as in the megagametophytes. Each allele in each sample were considered as replicate measures, 

513 and the difference between allele classes was tested with edgeR as described for the megagametophytes. 

514 This procedure takes into account sample-wise variation in total library size and potential reference 

515 mapping bias as described for the megagametophytes. 

516 Effects in homozygous diploids - Embryo genotypes were called from transcribed SNPs with custom 

517 Python scripts, taking into account the genotype of the associated megagametophyte. Testing for 

518 differential expression between homozygous genotypes was performed for genes in which the observed 

519 genotypes segregated in a Mendelian 1:2:1 ratio, that is, heterozygotes were observed in a frequency from 

520 25 to 41 and homozygotes in a frequency from 10 to 24. Testing procedure was identical to that described 

521 above concerning the megagametophytes, i.e. the edgeR package was used to test gene -wise counts and 

522 normalization factors, dispersions and offsets were calculated based on all aligned reads. 

523 Inheritance - An edgeR model for gene-wise expression levels in focal gene according to local or distant 

524 genotypes was constructed as described above in the case of homozygote difference. Contrasts between 

525 heterozygous genotype versus the two homozygous genotypes were tested with the makeContrasts 

526 function of edgeR. Significant contrast between heterozygotes and both homozygous classes was 

527 designated as evidence for additivity. A single significant contrast was designated as evidence for 

528 dominance of the non-significant genotype. If neither contrast were significant the inheritance of the 



22 



Downloaded from http://biorxiv.org/on September 18, 2014 

529 distant effect was designated as "unknown". For distant effects, we identified a non-redundant set of 

530 association pairs by selecting one additive or dominant/recessive association per linkage block per focal 

53 1 gene (lowest P-value selected). 

532 Effect size - Strength of association in likelihood-based models can be measured by a coefficient of 

533 correlation based on the likelihood ratio, the Rlr 2 (Supplemental text, Magee, 1990, Nagelkerke, 1991). 

534 Similar to the standard coefficient of determination in association studies (e.g. Laurie et al., 2004), this 

535 parameter can be broadly interpreted as the proportion of explained variation in the phenotype accounted 

536 by a genetic effect (Sun et al., 2010). Use of R LR 2 as a measure of effect size follows from the fact that 

537 allelic variation in a genetic variant that is strongly associated with expression variation changes strongly 

538 the expression level (Rockman and Kruglyak, 2006). Consistent with this, high values of R LR 2 

539 corresponded to genetic effects that were associated with single major effect loci and median R LR 2 closely 

540 corresponded to the median effect size previously estimated for eQTL in a yeast cross (Supplemental text, 

54 1 Brem and Kruglyak, 2005). 

542 R LR 2 was calculated from the likelihood ratio following (Magee, 1990); 

543 R LR = 1- exp(-LR/n) 

544 where LR is the likelihood ratio of the model (provided by the glniLRT function of edgeR) and n is the 

545 sample size. In haploids, this model included all samples assigned to one or the other allele. In diploids, 

546 the model included only homozygous genotypes. 

547 D/A ratio - D/A ratio was calculated as defined in Gibson et al., (2004). Relationship between Rlr 2 and 

548 the D/A ratio was tested with the R base function Im, considering D/A ratio as the response variable and 

549 Rlr 2 as the linear predictor. 
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550 Data access 

55 1 Scripts used for the analysis, variant calls (.vcf files) and gene expression tables are available from the 

552 Dryad database (Dryad allows data submission only following manuscript acceptance). Raw sequence 

553 data is available through the NCBI Sequence Read Archive (submission ongoing). 
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714 Figure legends 

715 Figure 1. Spruce haploid/diploid system. A) Seed development. Red and blue colors represent the two 

716 alleles at a given locus. B) Segregation of alleles in megagametophytes and embryos in a self -cross where 

717 the pollen is also the meiotic product from the mother tree. There are two possible haploid genotypes 

718 (megagametophytes) and three possible diploid genotypes (embryos) for each heterozygous locus in the 

719 mother's genome. C) RNA-seq in haploids and diploids facilitates the measurement of allele expression 

720 levels (number of reads is proportional to expression level of the allele) and transcribed SNPs. D) Genetic 

721 dissection of expression variation indicating tissue (In or 2n) used for each analysis step in the study. 

722 Figure 2. Mapping local and distant effects. A) Local effects were defined as genetically linked to the 

723 expressed gene, while distant effects were unlinked. B) Distribution of focal genes, according to detected 

724 SNPs and genetic effects. C) Position of local and distant effect (red and blue points, respectively) loci 
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725 versus positions of the focal gene. For distant effects, only associations that involve mapped focal genes 

726 are shown. Twelve largest linkage blocks (LB 1 -LB 12) are delimited with gray lines. Data points outside 

727 these represent smaller linkage blocks and focal genes not assigned to linkage blocks (UND). 

728 Figure 3. Genetic effects on gene expression in haploids and diploids. A) Tissue -preferential 

729 expression between haploid and diploid tissues. Log 2 FC of two or more was used to designate tissue - 

730 preferential expression (19.5% of genes). B) Effects that were observed between homozygous diploids. 

731 Log fold change of expression levels between allele classes are plotted in haploids (In x-axis) against that 

732 in diploids that were homozygous for the given effect (2n y-axis). Bars (right) represent the proportion of 

733 tested local and distant effects (discovered in haploids) that were observed between diploids homozygous 

734 for the effects (see Tables S3 and S4). Coefficient of determination (R 2 ) of linear models (expression in 

735 diploids as dependent variable and expression in haploids as linear predictor) for each effects category are 

736 given above the scatterplot. Distant m. (green squares); distant effects influencing mapped focal genes, 

737 distant n.m. (blue circles); distant effects influencing non-mapped focal genes, local (red angled squares), 

738 grey symbols; effects that change signs between haploids and diploids and that were excluded from 

739 downstream analysis. 

740 Figure 4. Local effects in cis and trans. A) Allelic expression differences due to local effects observed in 

741 haploids were compared to allele expression differences in heterozygous diploids. B) Tissue preferential 

742 expression levels in all genes (white), genes under local effects in haploids (grey) and genes under cis 

743 effects in diploids (black). C) Close-up of tissue-preferential expression in genes under local (grey) and 

744 cis effects (black). Threshold of tissue -preferential expression correspond to log 2 FC values -2 and 2. D) 

745 Comparison of allele counts within heterozygous diploids. Each data point represents a gene under local 

746 cis effect. Horizontal and vertical lines intersect at the mean value of allele expression levels and show the 

747 standard error of the mean counts of each allele. Blue: genes under compensatory effects, red: genes 

748 where no compensation was observed. Notice there is no difference in the distribution of blue and red data 

749 points and the standard errors never cross the diagonal line, which marks equal expression of the two 
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750 alleles. Pie-chart, proportions of local cis effects with expression differences between homozygotes (no 

751 compensation, no comp.) and those where no homozygote difference was observed (compensation, 

752 comp.). E) Allele expression levels (log 2 FC) between homozygous diploids (x-axis) and within 

753 heterozygous diploids (y-axis). Genes under local cis effects with and without compensation are 

754 represented as blue circles and red squares, respectively. Local trans effects, green triangles. Grey 

755 symbols correspond to effects that change signs between homozygotes and heterozygotes 

756 (overcompensation) and that were excluded from downstream analyses. Including these cases in 

757 downstream analyses did not change the conclusions. F) Schematic of compensation and 

758 overcompensation. Locus A represent the focal genes the expression of which is being measured and locus 

759 B the compensatory effect in linkage. Plus and minus signs represent signs of effects on focal gene 

760 expression. 

761 Figure 5. Additive versus dominant/recessive modes of inheritance of local and distant effects. 

762 Cartoon tables (left) of different possible categories of allele specificity and inheritance. Only cases of 

763 complete dominance (gene expression in heterozygote is identical to homozygous dominant) and 

764 additivity (gene expression in heterozygote is exactly at midpoint between homozygotes) are shown for 

765 clarity. mRNA illustrations (distant effects) and letters corresponding to nucleotides (local effects) depict 

766 allelic expression levels in different embryo genotypes. Pie-charts (right) show the proportions of local 

767 and distant effects observed in haploid and all diploid genotypes under each category. Proportions of each 

768 effect category are calculated from tables S3 and S4. For local effects, the relative proportions of cis and 

769 trans observable between all three genotype classes of diploids are calculated as ratio of column 4 on 

770 column 5 of table S3. Proportions of different modes of inheritance within cis and trans effects are 

771 calculated from percentages in columns 6-9 of table S3. Relative frequencies of local cis and trans effects 

772 are different from columns 3 and 4 of table S3 because some local cis effects were observed only in 

773 heterozygotes and did not correspond to differences between homozygous diploid genotypes. "Distant m." 
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774 and "distant n.m." effects represent non-redundant effects that influence mapped and non-mapped focal 

775 genes, respectively, d.r.; dominant/recessive, add.; additive, n.d.; non-determined. 

776 Figure 6. Phenotypic masking of heterozygote expression by dominance relationships and their 

777 correlation with effect size. A) D/A ratio. Distance between genotypes (bb, bB and BB) represents 

778 expression difference. D/A ratios close to zero indicate that the mean of heterozygote expression is 

779 halfway between mean homozygote expression levels (complete additivity) while ratios close to 1 or -1 

780 indicate complete phenotypic dominance by one allele. Ratios beyond 1 and -1 indicate over- and under 

78 1 dominance, respectively. B) Distribution of D/A ratio in different effects. Dominant/recessive and additive 

782 associations are represented as black and red bars, respectively. C) D/A ratio of dominant/recessive (black 

783 open circles) and additive (red open circles) expression phenotypes (y-axis) is correlated with the strength 

784 of association (Rlr 2 ), which we use here as a relative measure of the proportion of explained variation by 

785 different genetic effects (x-axis, see Supplemental text and Methods, Rockman and Kruglyak, 2006). Grey 

786 points represent over- and underdominant effects which are not included in the theoretical predictions and 

787 therefore excluded from regression analysis. Pie-charts represent the proportions of dominant/recessive 

788 and additive effects in each category. Lines represent linear model fit between D/A ratio and strength of 

789 association for each category, "b" represents the slope of the linear regression model and describes the 

790 tendency of D/A ratio to increase with increasingly large effects in distant trans effects and to decrease in 

791 local cis effects. D) The physiological theory of dominance posits that recessivity of phenotypes is a 

792 consequence of a diminishing returns relationship between genotype and phenotype (Wright, 1934). 

793 According to Wright (1934), the hyperbolic response curve between genotype and phenotype is due to the 

794 fact that the individual gene product is not the rate -limiting factor in a given reaction. Kacser and Burns 

795 (1981) later showed that enzymatic reactions that involve multiple steps are robust to changes in the 

796 activity of the constituent factors, which is at the origin of the hyperbolic response curve. Mutations in 

797 factors that have larger homozygote differences correspond to more recessive phenotypes as the response 

798 curve between genotype and phenotype falls off the near linear portion, as was observed for trans effects. 
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799 Cis effects were relatively robust to changes in effect size, which was consistent with a linear response 

800 curve with one rate-limiting factor (Kacser and Burns, 1981). Cartoon adapted from Wright (1934) and 

80 1 Phadnis and Fry (2005). 
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